feat: Image search example #5

Merged
merged 12 commits into from
Jul 8, 2023

Conversation

Green-Sky
Collaborator

@Green-Sky Green-Sky commented Jun 17, 2023

I used the USearch vector search library to build an image embedding database and then search it by similarity to a text string.

build the database:

$ bin/image-search-build -m ../models/ggml-model-f16.bin ../tests/
clip_model_load: loading model from '../models/ggml-model-f16.bin' - please wait....................................................clip_model_load: model size =   288.93 MB / num tensors = 397
clip_model_load: model loadded
main: starting base dir scan of '../tests/'
main: found image file '../tests/white.jpg'
main: found image file '../tests/red_apple.jpg'

It generates two files: images.usearch and images.path.

search by similarity to string:

$ bin/image-search -m ../models/ggml-model-f16.bin "a red apple"
clip_model_load: loading model from '../models/ggml-model-f16.bin' - please wait....................................................clip_model_load: model size =   288.93 MB / num tensors = 397
clip_model_load: model loadded
search results:
similarity path
  0.655244 /home/green/workspace/clip.cpp/tests/red_apple.jpg
  0.773273 /home/green/workspace/clip.cpp/tests/white.jpg

It is a very rough implementation. It requires C++17 because it uses std::filesystem to iterate over the filesystem.
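A minimal sketch of the kind of recursive directory scan that std::filesystem makes possible in C++17. This is not the code from the PR: the helper names `has_image_extension`, `find_images` and `write_path_list`, the list of accepted extensions, and the one-path-per-line output format are all assumptions made for illustration (the real images.path format may differ).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not from the PR): true for a few common image extensions.
bool has_image_extension(const fs::path& p) {
    std::string ext = p.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext == ".jpg" || ext == ".jpeg" || ext == ".png";
}

// Hypothetical helper: recursively collect image files below base_dir.
// Error codes are used instead of exceptions; an iteration error ends the
// walk early rather than throwing.
std::vector<fs::path> find_images(const fs::path& base_dir) {
    std::vector<fs::path> images;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        base_dir, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    while (!ec && it != end) {
        std::error_code entry_ec;
        if (it->is_regular_file(entry_ec) && !entry_ec &&
            has_image_extension(it->path())) {
            images.push_back(it->path());
        }
        it.increment(ec);
    }
    std::sort(images.begin(), images.end());  // deterministic order
    return images;
}

// Hypothetical helper: write one path per line (assumed format, for illustration).
bool write_path_list(const std::vector<fs::path>& images, const fs::path& out_file) {
    std::ofstream out(out_file);
    for (const auto& p : images) {
        out << p.string() << '\n';
    }
    return static_cast<bool>(out);
}
```

Calling `find_images("../tests/")` would return the image paths found under that directory, and the position of each path in the returned vector could serve as its key in a vector index.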

A similarity search via an image would be nice too.

basically we could copy all the features from here: https://github.com/yurijmikhalevich/rclip

TODO:

@Green-Sky Green-Sky force-pushed the image-search branch 2 times, most recently from 9553d20 to 77622a7 Compare June 17, 2023 18:55
Owner

@monatis monatis left a comment


I like the idea of providing an image search example with a tiny vector search implementation directly in C++, but I highlighted some points.

Also, let's not use tab characters for indents please.

general code clean up. but the rest of the repo is still in a similar state 😝

Things may be messy at early stages to quickly introduce new features, but I'm trying to tidy up the code and README as we make progress in the project. You can always suggest better structures for the code and the overall project.

examples/image-search/usearch/index.hpp (outdated, resolved)
examples/image-search/build.cpp (outdated, resolved)
examples/image-search/build.cpp (outdated, resolved)
@Green-Sky
Collaborator Author

Green-Sky commented Jun 18, 2023

Also, let's not use tab characters for indents please.

Ohhhhh yea. sorry. forgot to reverse this 😆

@Green-Sky
Collaborator Author

Things may be messy at early stages to quickly introduce new features, but I'm trying to tidy up the code and README as we make progress in the project. You can always suggest better structures for the code and the overall project.

yea, I also intentionally marked this PR as draft for those reasons. :)

@monatis
Owner

monatis commented Jun 18, 2023

yea, I also intentionally marked this PR as draft for those reasons. :)

Great, thank you. Your work is really appreciated. To be honest, I was not aware of USearch and was instead considering the original hnswlib library for the search example. But USearch looks like a better candidate.

@Green-Sky
Collaborator Author

Force-pushed with FetchContent and spaces instead of tabs.

@Green-Sky Green-Sky force-pushed the image-search branch 5 times, most recently from 2385eaa to 7d170cf Compare June 25, 2023 13:50
@Green-Sky Green-Sky force-pushed the image-search branch 3 times, most recently from 924cc29 to dfad626 Compare June 26, 2023 19:56
@monatis
Owner

monatis commented Jul 3, 2023

Do you need an extra hand in this?

@Green-Sky
Collaborator Author

Was doing something completely different last week :)
Will try to get this into a mergeable state this week.

@Green-Sky
Collaborator Author

just rebased and changed some stuff you committed on main

issues that I found:

  1. You added CMake options for conditionally disabling tests and examples; I negated them and made them default to whether the build is standalone (more convenient).
  2. You require C++20 (!); it should only need C++11, I think (and C++17 for std::filesystem in image-search until that is changed to something else).
  3. You set a GCC/Clang-specific warning flag and the O3 optimization level indiscriminately:
    add_compile_options(-Wno-format -O3)
    You should not; you already default to Release, which sets O3:
    set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)

@monatis
Owner

monatis commented Jul 7, 2023

Yes, CMakeLists.txt still requires some testing and optimizations (also for other architectures, such as different ARM versions), so that might be another PR once I take my Raspberry Pis down from the shelf. The C++20 standard was just a guard, but we can downgrade it now or after some tests. I can refactor the directory walk to use the function from common-clip, which does not require C++17. Then everything will be OK with C++11.

I'm OK with it as is, and then we can continue to improve it. When #30 is merged, we can also use batch inference when indexing images.

@Green-Sky
Collaborator Author

Yes, CMakeLists.txt still requires some testing and optimizations (also for other architectures, such as different ARM versions), so that might be another PR once I take my Raspberry Pis down from the shelf. The C++20 standard was just a guard, but we can downgrade it now or after some tests. I can refactor the directory walk to use the function from common-clip, which does not require C++17. Then everything will be OK with C++11.

yea, there have also been changes upstream (especially llama), which you obviously based the code on.

I'm OK with it as is, and then we can continue to improve it. When #30 is merged, we can also use batch inference when indexing images.

Yea. Let me run some small tests and then i will make it ready for review.

@monatis
Owner

monatis commented Jul 7, 2023

yea, there have also been changes upstream (especially llama), which you obviously based the code on.

Exactly. All the ggml-based projects are moving very fast, so I believe we don't need to target 100% production-grade multi-platform support. Speed of iteration and innovation is the priority for now.

Yea. Let me run some small tests and then i will make it ready for review.

Great. Thank you.

@Green-Sky Green-Sky marked this pull request as ready for review July 8, 2023 00:44
@monatis
Owner

monatis commented Jul 8, 2023

USearch reports distance values instead of similarities, so I made changes to emphasize it. LGTM, merging. TODOs can be fixed later.

Thanks for your contributions.
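The distance-versus-similarity point above can be illustrated with a small sketch. It assumes the index uses a cosine metric and that cosine distance is defined as 1 minus cosine similarity; both are assumptions for illustration, not details confirmed in this thread, and the function names are hypothetical. Under that assumption a smaller value means a closer match, which fits the sample output in the PR description, where red_apple.jpg (0.655244) ranks ahead of white.jpg (0.773273) for the query "a red apple".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equal-length, non-zero vectors: 1 means same
// direction, 0 means orthogonal, -1 means opposite direction.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Assumed definition: cosine distance = 1 - cosine similarity.
// 0 means same direction, 2 means opposite; smaller is a better match.
float cosine_distance(const std::vector<float>& a, const std::vector<float>& b) {
    return 1.0f - cosine_similarity(a, b);
}
```

With this definition, results should be sorted in ascending order of distance, and a reported distance d can be turned back into a similarity with 1 - d.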

@monatis monatis merged commit 774c46a into monatis:main Jul 8, 2023